Demography is the study of human populations, including their size, composition, and distribution across space, and the processes through which populations change. Population stability or change depends on the 'Big Three' factors of demography: births, deaths, and migration.
For this project, the data is taken from the Gapminder dataset. Gapminder maintains a large repository of demographic data on indicators such as economy, health, population, education, energy, environment, and infrastructure, with some series ranging from the year 1800 to 2016. For the sake of simplicity, this project considers only a subset of the data available in Gapminder.
The following data will be used in our analysis:
With these indicators, we can try to ask a few questions.
# importing modules used in the project
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# helps to create inline plots
%matplotlib inline
# set style of plot
sns.set(style='darkgrid')
#*************************************************#
# LOAD: load the Total Population data
#*************************************************#
total_population = pd.read_csv('population_total.csv')
total_population.info()
total_population.head()
As we can see, it has one row per country and one column per year, from 1800 to 2018. To perform the analysis, we need to restructure the data. The functions below will help us restructure it.
The first function takes the countries and years as input and populates two lists, one for country and another for year. The goal is to repeat the full range of years (1800-2018) for each country.
#------------------------------------------#
#
# Function: load_country_year
#
# Populates country_list and year_list from the inputs countries, years
# The goal is to populate years(1800-2018) for every country
#
# Args:
# (list) countries - list of country names
# (list) years - list of years
#
# Returns:
# (list) country_list - list of country names extended for every year
# (list) year_list - list of years extended for every country
#
#-------------------------------------------#
def load_country_year(countries,years):
    country_list = []
    year_list = []
    for country in countries:
        for year in years:
            country_list.append(country)
            year_list.append(year)
    return country_list,year_list
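As a side note, the nested loop above enumerates every (country, year) pair, which is what `itertools.product` from the standard library does. The following is only a sketch of that equivalent formulation; the function name `load_country_year_product` and the demo values are made up for illustration and are not part of the project.

```python
from itertools import product


def load_country_year_product(countries, years):
    """Return (country_list, year_list) covering every country/year pair.

    Hypothetical variant of load_country_year: the country varies slowest
    and the year varies fastest, matching the nested-loop order.
    """
    pairs = list(product(countries, years))
    country_list = [country for country, _ in pairs]
    year_list = [year for _, year in pairs]
    return country_list, year_list


# Made-up demo inputs (not real Gapminder values)
countries_demo = ["Aland", "Borduria"]
years_demo = [1800, 1801, 1802]
country_list_demo, year_list_demo = load_country_year_product(countries_demo, years_demo)
print(country_list_demo)
print(year_list_demo)
```

Both lists end up with `len(countries) * len(years)` entries, which is the length the population values need to have as well.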
The second function takes the countries and a dataframe as input and populates a data list. The goal is to translate each country's row into a list of values; the values for each country are appended to the list in order.
#------------------------------------------#
#
# Function: load_data
#
# Populates data_list from the inputs countries, dataframe
# The goal is to convert row values into a list
#
# Args:
# (list) countries - list of country names
# (dataframe) df - dataframe from which data must be extracted
#
# Returns:
# (list) data_list - list of data points for each country
#
#-------------------------------------------#
def load_data(countries,df):
    data_list = []
    for country in countries:
        data_list.extend(df.loc[country].tolist())
    return data_list
We get the country names from the 'geo' column and the years from the column names. The 'geo' column is then set as the index because that makes it easier to slice the data for each country.
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Total Population dataset
#*************************************************#
country_population = total_population.geo
total_population.set_index('geo',inplace=True)
years_population = list(map(int,total_population.columns))
Here we populate three lists: one each for country, year, and data (in this case, population).
#*************************************************#
# CONVERT: Data from Total Population dataset
# is converted to separate lists
#*************************************************#
country_population_list,years_population_list = load_country_year(country_population,years_population)
population_list = load_data(country_population,total_population)
# check length of the list created
len(country_population_list)
# check length of the list created
len(years_population_list)
# check length of the list created
len(population_list)
With each list created separately, let us combine them into a dataframe to get the final structure.
#*************************************************#
# CREATE: create dataframe with the Total Population data
#*************************************************#
population_df = pd.DataFrame({'country':country_population_list,'year':years_population_list})
population_df['population'] = population_list
population_df.info()
We can see that there are no null values in the dataset, so it is ready to use for the analysis.
population_df.head()
This is the structure that we are trying to build for our dataframe. The same building process will be followed for the other indicators, such as urban population and water access.
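Because the same wide-to-long conversion is repeated for every indicator, it could also be expressed with pandas' built-in `DataFrame.melt`. The sketch below is one possible way to do that, not the approach used in this project; the helper name `wide_to_long`, its parameters, and the demo values are assumptions made for illustration. It assumes the file has a 'geo' column followed by one column per year.

```python
import pandas as pd


def wide_to_long(wide_df, value_name, id_col="geo"):
    """Reshape a wide table (one column per year) to country/year/value rows.

    Hypothetical helper: assumes every column other than id_col is a year.
    """
    long_df = wide_df.melt(id_vars=id_col, var_name="year", value_name=value_name)
    long_df = long_df.rename(columns={id_col: "country"})
    long_df["year"] = long_df["year"].astype(int)
    # Order rows by country, then year (alphabetical country order)
    return long_df.sort_values(["country", "year"]).reset_index(drop=True)


# Made-up demo data (not real Gapminder values)
wide_demo = pd.DataFrame(
    {"geo": ["Aland", "Borduria"], "1800": [10, 20], "1801": [11, 21]}
)
long_demo = wide_to_long(wide_demo, "population")
print(long_demo)
```

Note that this sketch sorts countries alphabetically, whereas the list-based approach keeps the order in which countries appear in the file; the two agree only when the file is already sorted by country.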
Let us also load other indicators
Urban population refers to people living in urban areas as defined by national statistical offices
#*************************************************#
# LOAD: load the Urban Population data
#*************************************************#
urban_population = pd.read_csv('urban_population_percent_of_total.csv')
urban_population.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Urban Population dataset
#*************************************************#
country_urban = urban_population.geo
urban_population.set_index('geo',inplace=True)
years_urban = list(map(int,urban_population.columns))
#*************************************************#
# CONVERT: Data from Urban Population dataset
# is converted to separate lists
#*************************************************#
country_urban_list,years_urban_list = load_country_year(country_urban,years_urban)
urban_list = load_data(country_urban,urban_population)
#*************************************************#
# CREATE: create dataframe with the Urban Population data
#*************************************************#
urban_df = pd.DataFrame({'country':country_urban_list,'year':years_urban_list})
urban_df['urban_population_percent'] = urban_list
urban_df.info()
As we can see, there are some null values in the dataset. We will see how to handle them after loading the other indicator datasets.
urban_df.head()
Life expectancy is the average number of years a newborn child would live if current mortality patterns were to stay the same.
#*************************************************#
# LOAD: load the Life Expectancy data
#*************************************************#
life_expectancy = pd.read_csv('life_expectancy_years.csv')
life_expectancy.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Life Expectancy dataset
#*************************************************#
country_life_exp = life_expectancy.geo
life_expectancy.set_index('geo',inplace=True)
years_life_exp = list(map(int,life_expectancy.columns))
#*************************************************#
# CONVERT: Data from Life Expectancy dataset
# is converted to separate lists
#*************************************************#
country_life_exp_list,years_life_exp_list = load_country_year(country_life_exp,years_life_exp)
life_exp_list = load_data(country_life_exp,life_expectancy)
#*************************************************#
# CREATE: create dataframe with the Life Expectancy data
#*************************************************#
life_exp_df = pd.DataFrame({'country':country_life_exp_list,'year':years_life_exp_list})
life_exp_df['life_expectancy_years'] = life_exp_list
life_exp_df.info()
life_exp_df.head()
#*************************************************#
# LOAD: load the Birth Rate data
#*************************************************#
crude_birth = pd.read_csv('crude_birth_rate_births_per_1000_population.csv')
crude_birth.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Birth Rate dataset
#*************************************************#
country_birth = crude_birth.geo
crude_birth.set_index('geo',inplace=True)
years_birth = list(map(int,crude_birth.columns))
#*************************************************#
# CONVERT: Data from Birth Rate dataset
# is converted to separate lists
#*************************************************#
country_birth_list,years_birth_list = load_country_year(country_birth,years_birth)
birth_list = load_data(country_birth,crude_birth)
#*************************************************#
# CREATE: create dataframe with the Birth Rate data
#*************************************************#
birth_rate_df = pd.DataFrame({'country':country_birth_list,'year':years_birth_list})
birth_rate_df['crude_birth_rate'] = birth_list
birth_rate_df.info()
birth_rate_df.head()
#*************************************************#
# LOAD: load the Death Rate data
#*************************************************#
crude_death = pd.read_csv('crude_death_rate_deaths_per_1000_population.csv')
crude_death.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Death Rate dataset
#*************************************************#
country_death = crude_death.geo
crude_death.set_index('geo',inplace=True)
years_death = list(map(int,crude_death.columns))
#*************************************************#
# CONVERT: Data from Death Rate dataset
# is converted to separate lists
#*************************************************#
country_death_list,years_death_list = load_country_year(country_death,years_death)
death_list = load_data(country_death,crude_death)
#*************************************************#
# CREATE: create dataframe with the Death Rate data
#*************************************************#
death_rate_df = pd.DataFrame({'country':country_death_list,'year':years_death_list})
death_rate_df['crude_death_rate'] = death_list
death_rate_df.info()
death_rate_df.head()
#*************************************************#
# LOAD: load the Child Mortality data
#*************************************************#
child_mortality = pd.read_csv('child_mortality_0_5_year_olds_dying_per_1000_born.csv')
child_mortality.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Child Mortality dataset
#*************************************************#
country_mortality = child_mortality.geo
child_mortality.set_index('geo',inplace=True)
years_mortality = list(map(int,child_mortality.columns))
#*************************************************#
# CONVERT: Data from Child Mortality dataset
# is converted to separate lists
#*************************************************#
country_mortality_list,years_mortality_list = load_country_year(country_mortality,years_mortality)
mortality_list = load_data(country_mortality,child_mortality)
#*************************************************#
# CREATE: create dataframe with the Child Mortality data
#*************************************************#
child_mortality_df = pd.DataFrame({'country':country_mortality_list,'year':years_mortality_list})
child_mortality_df['child_mortality_rate'] = mortality_list
child_mortality_df.info()
child_mortality_df.head()
Human Development Index is the index used to rank the countries by level of 'Human Development'. It has 3 dimensions: health level, education level, living standard level
#*************************************************#
# LOAD: load the Human Development Index data
#*************************************************#
hd_index = pd.read_csv('hdi_human_development_index.csv')
hd_index.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Human Development Index dataset
#*************************************************#
country_development = hd_index.geo
hd_index.set_index('geo',inplace=True)
years_development = list(map(int,hd_index.columns))
#*************************************************#
# CONVERT: Data from Human Development Index dataset
# is converted to separate lists
#*************************************************#
country_development_list,years_development_list = load_country_year(country_development,years_development)
development_list = load_data(country_development,hd_index)
#*************************************************#
# CREATE: create dataframe with the Human Development Index data
#*************************************************#
human_development_df = pd.DataFrame({'country':country_development_list,'year':years_development_list})
human_development_df['human_development_index'] = development_list
human_development_df.info()
human_development_df.head()
#*************************************************#
# LOAD: load the Median Age data
#*************************************************#
median_age = pd.read_csv('median_age_years.csv')
median_age.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Median Age dataset
#*************************************************#
country_median_age = median_age.geo
median_age.set_index('geo',inplace=True)
years_median_age = list(map(int,median_age.columns))
#*************************************************#
# CONVERT: Data from Median Age dataset
# is converted to separate lists
#*************************************************#
country_median_age_list,years_median_age_list = load_country_year(country_median_age,years_median_age)
median_age_list = load_data(country_median_age,median_age)
#*************************************************#
# CREATE: create dataframe with the Median Age data
#*************************************************#
median_age_df = pd.DataFrame({'country':country_median_age_list,'year':years_median_age_list})
median_age_df['median_age'] = median_age_list
median_age_df.info()
median_age_df.head()
Income is the Gross Domestic Product per person, adjusted for differences in purchasing power (in international dollars, fixed 2011 prices).
#*************************************************#
# LOAD: load the Income data
#*************************************************#
income_per_person = pd.read_csv('income_per_person_gdppercapita_ppp_inflation_adjusted.csv')
income_per_person.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Income dataset
#*************************************************#
country_income = income_per_person.geo
income_per_person.set_index('geo',inplace=True)
years_income = list(map(int,income_per_person.columns))
#*************************************************#
# CONVERT: Data from Income dataset
# is converted to separate lists
#*************************************************#
country_income_list,years_income_list = load_country_year(country_income,years_income)
income_list = load_data(country_income,income_per_person)
#*************************************************#
# CREATE: create dataframe with the Income data
#*************************************************#
income_df = pd.DataFrame({'country':country_income_list,'year':years_income_list})
income_df['income_per_person'] = income_list
income_df.info()
income_df.head()
The percentage of people using at least basic sanitation services, that is, improved sanitation facilities that are not shared with other households. Improved sanitation facilities include flush/pour flush to piped sewer systems, septic tanks or pit latrines; ventilated improved pit latrines, composting toilets or pit latrines with slabs.
#*************************************************#
# LOAD: load the Sanitation Access data
#*************************************************#
basic_sanitation = pd.read_csv('at_least_basic_sanitation_overall_access_percent.csv')
basic_sanitation.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Sanitation Access dataset
#*************************************************#
country_sanitation = basic_sanitation.geo
basic_sanitation.set_index('geo',inplace=True)
years_sanitation = list(map(int,basic_sanitation.columns))
#*************************************************#
# CONVERT: Data from Sanitation Access dataset
# is converted to separate lists
#*************************************************#
country_sanitation_list,years_sanitation_list = load_country_year(country_sanitation,years_sanitation)
sanitation_list = load_data(country_sanitation,basic_sanitation)
#*************************************************#
# CREATE: create dataframe with the Sanitation Access data
#*************************************************#
sanitation_access_df = pd.DataFrame({'country':country_sanitation_list,'year':years_sanitation_list})
sanitation_access_df['sanitation_access_percent'] = sanitation_list
sanitation_access_df.info()
sanitation_access_df.head()
The percentage of people using at least basic water services. Basic drinking water services are defined as drinking water from an improved source, provided collection time is not more than 30 minutes for a round trip. Improved water sources include piped water, boreholes or tubewells, protected dug wells, protected springs, and packaged or delivered water.
#*************************************************#
# LOAD: load the Water Access data
#*************************************************#
basic_water = pd.read_csv('at_least_basic_water_source_overall_access_percent.csv')
basic_water.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Water Access dataset
#*************************************************#
country_water = basic_water.geo
basic_water.set_index('geo',inplace=True)
years_water = list(map(int,basic_water.columns))
#*************************************************#
# CONVERT: Data from Water Access dataset
# is converted to separate lists
#*************************************************#
country_water_list,years_water_list = load_country_year(country_water,years_water)
water_list = load_data(country_water,basic_water)
#*************************************************#
# CREATE: create dataframe with the Water Access data
#*************************************************#
water_access_df = pd.DataFrame({'country':country_water_list,'year':years_water_list})
water_access_df['water_access_percent'] = water_list
water_access_df.info()
water_access_df.head()
#*************************************************#
# LOAD: load the Internet Access data
#*************************************************#
internet_access = pd.read_csv('internet_users_percent.csv')
internet_access.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Internet Access dataset
#*************************************************#
country_internet = internet_access.geo
internet_access.set_index('geo',inplace=True)
years_internet = list(map(int,internet_access.columns))
#*************************************************#
# CONVERT: Data from Internet Access dataset
# is converted to separate lists
#*************************************************#
country_internet_list,years_internet_list = load_country_year(country_internet,years_internet)
internet_list = load_data(country_internet,internet_access)
#*************************************************#
# CREATE: create dataframe with the Internet Access data
#*************************************************#
internet_access_df = pd.DataFrame({'country':country_internet_list,'year':years_internet_list})
internet_access_df['internet_access_percent'] = internet_list
internet_access_df.info()
internet_access_df.tail()
#*************************************************#
# LOAD: load the Total Computer data
#*************************************************#
computer_access = pd.read_csv('personal_computers_total.csv')
computer_access.describe()
#*************************************************#
# EXTRACT: Get Country names list and Years list
# from Total Computer dataset
#*************************************************#
country_computer = computer_access.geo
computer_access.set_index('geo',inplace=True)
years_computer = list(map(int,computer_access.columns))
#*************************************************#
# CONVERT: Data from Total Computer dataset
# is converted to separate lists
#*************************************************#
country_computer_list,years_computer_list = load_country_year(country_computer,years_computer)
computer_list = load_data(country_computer,computer_access)
#*************************************************#
# CREATE: create dataframe with the Total Computer data
#*************************************************#
computer_access_df = pd.DataFrame({'country':country_computer_list,'year':years_computer_list})
computer_access_df['no_of_computers'] = computer_list
computer_access_df.info()
computer_access_df.tail()
Now that we have loaded our data, we can start the cleaning process. Since the data is downloaded from Gapminder World, it is clean, without duplicates or datatype issues. We still have to deal with missing values.
# count of nulls in the dataset
population_df.isnull().sum()
# count of nulls in the dataset
urban_df.isnull().sum()
The urban population dataset has 65 missing values. The countries with missing values are listed below.
# names of countries having null values
urban_df[urban_df.urban_population_percent.isnull()].country.unique()
We will fill each missing value with the nearest following value for the same country, using a grouped "back fill".
# fill missing values
urban_df = urban_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
urban_df.isnull().sum()
We can still see 5 records with missing values. This is because these records fall at the end of a country's series: since no following value is available, they are not filled. We will use "forward fill" to fill them with the nearest preceding value.
# fill missing values
urban_df = urban_df.groupby('country').apply(lambda x: x.fillna(method='ffill'))
# count of nulls in the dataset
urban_df.isnull().sum()
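The back-fill-then-forward-fill procedure can also be written as a single grouped transform. This is a hedged sketch rather than the code used above: recent pandas releases deprecate the `method=` argument of `fillna`, so the sketch calls `bfill()` and `ffill()` directly. The helper name `fill_within_country` and the demo values are hypothetical, and the rows are assumed to be ordered by year within each country.

```python
import numpy as np
import pandas as pd


def fill_within_country(df, value_col, group_col="country"):
    """Back-fill, then forward-fill, value_col separately for each country.

    Hypothetical helper; a country with no observations at all stays NaN.
    """
    out = df.copy()
    out[value_col] = out.groupby(group_col)[value_col].transform(
        lambda s: s.bfill().ffill()
    )
    return out


# Made-up demo data (not real Gapminder values)
fill_demo = pd.DataFrame(
    {
        "country": ["A", "A", "A", "B", "B", "B"],
        "year": [2000, 2001, 2002] * 2,
        "urban_population_percent": [np.nan, 40.0, np.nan, 55.0, np.nan, 60.0],
    }
)
filled_demo = fill_within_country(fill_demo, "urban_population_percent")
print(filled_demo)
```

Using `transform` keeps the original row index and columns, so the result has the same shape as the input and values never leak from one country into another.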
# count of nulls in the dataset
life_exp_df.isnull().sum()
We see that there are 516 missing values. We will use the same procedure as for urban population to fill the missing values for all the indicators.
# names of countries having null values
life_exp_df[life_exp_df.life_expectancy_years.isnull()].country.unique()
# fill missing values
life_exp_df = life_exp_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
life_exp_df.isnull().sum()
# fill missing values
life_exp_df = life_exp_df.groupby('country').apply(lambda x: x.fillna(method='ffill'))
# count of nulls in the dataset
life_exp_df.isnull().sum()
# count of nulls in the dataset
birth_rate_df.isnull().sum()
# names of countries having null values
birth_rate_df[birth_rate_df.crude_birth_rate.isnull()].country.unique()
# non-null counts per column for South Sudan
birth_rate_df[birth_rate_df.country=='South Sudan'].count()
# fill missing values
birth_rate_df = birth_rate_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
birth_rate_df.isnull().sum()
# count of nulls in the dataset
death_rate_df.isnull().sum()
# count of nulls in the dataset
child_mortality_df.isnull().sum()
# names of countries having null values
child_mortality_df[child_mortality_df.child_mortality_rate.isnull()].country.unique()
# fill missing values
child_mortality_df = child_mortality_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
child_mortality_df.isnull().sum()
# fill missing values
child_mortality_df = child_mortality_df.groupby('country').apply(lambda x: x.fillna(method='ffill'))
# count of nulls in the dataset
child_mortality_df.isnull().sum()
# count of nulls in the dataset
human_development_df.isnull().sum()
# names of countries having null values
human_development_df[human_development_df.human_development_index.isnull()].country.unique()
# fill missing values
human_development_df = human_development_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
human_development_df.isnull().sum()
# count of nulls in the dataset
median_age_df.isnull().sum()
# names of countries having null values
median_age_df[median_age_df.median_age.isnull()].country.unique()
# fill missing values
median_age_df = median_age_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
median_age_df.isnull().sum()
# count of nulls in the dataset
income_df.isnull().sum()
# count of nulls in the dataset
sanitation_access_df.isnull().sum()
# names of countries having null values
sanitation_access_df[sanitation_access_df.sanitation_access_percent.isnull()].country.unique()
# fill missing values
sanitation_access_df = sanitation_access_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
sanitation_access_df.isnull().sum()
# fill missing values
sanitation_access_df = sanitation_access_df.groupby('country').apply(lambda x: x.fillna(method='ffill'))
# count of nulls in the dataset
sanitation_access_df.isnull().sum()
# count of nulls in the dataset
water_access_df.isnull().sum()
# names of countries having null values
water_access_df[water_access_df.water_access_percent.isnull()].country.unique()
# fill missing values
water_access_df = water_access_df.groupby('country').apply(lambda x: x.fillna(method='bfill'))
# count of nulls in the dataset
water_access_df.isnull().sum()
# fill missing values
water_access_df = water_access_df.groupby('country').apply(lambda x: x.fillna(method='ffill'))
# count of nulls in the dataset
water_access_df.isnull().sum()
# count of nulls in the dataset
internet_access_df.isnull().sum()
# minimum year in internet dataset
internet_access_df.year.min()
In this case, we know that the internet is a recent invention, so filling the missing values with the nearest following value would be absurd: the internet was not in use in 1960. Instead, we fill the missing data with the scalar value zero.
# fill missing values
internet_access_df.fillna(0,inplace=True)
# count of nulls in the dataset
internet_access_df.isnull().sum()
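One caveat of a blanket `fillna(0)` is that it also zeroes any gaps that occur after a country's first recorded value. If that matters, a possible refinement is to zero-fill only the years before the first observation and leave later gaps for a separate decision. The sketch below illustrates that idea under stated assumptions; the helper name `zero_fill_leading` and the demo values are hypothetical, and rows are assumed to be ordered by year within each country.

```python
import numpy as np
import pandas as pd


def zero_fill_leading(df, value_col, group_col="country"):
    """Set missing values before a country's first observation to 0.

    Hypothetical helper; gaps after the first observation are left as NaN.
    """
    out = df.copy()
    # After a forward fill, only the rows before the first observation are still NaN
    seen = out.groupby(group_col)[value_col].transform(lambda s: s.ffill()).notna()
    out[value_col] = out[value_col].where(seen, 0)
    return out


# Made-up demo data (not real Gapminder values)
internet_demo = pd.DataFrame(
    {
        "country": ["A"] * 5,
        "year": [1988, 1989, 1990, 1991, 1992],
        "internet_access_percent": [np.nan, np.nan, 1.5, np.nan, 3.0],
    }
)
internet_filled_demo = zero_fill_leading(internet_demo, "internet_access_percent")
print(internet_filled_demo)
```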
# count of nulls in the dataset
computer_access_df.isnull().sum()
This dataset falls in the same category as the internet dataset, so we will follow the same pattern.
# fill missing values
computer_access_df.fillna(0,inplace=True)
# count of nulls in the dataset
computer_access_df.isnull().sum()
We have cleaned the data. Let us also combine them together. It will help us with analysis.
#*************************************************#
# MERGE: Datasets are merged using pandas merge.
# Left join is performed on columns country & year
#*************************************************#
data_set = pd.merge(population_df,urban_df,how='left',on=['country','year'])
data_set = data_set.merge(life_exp_df,how='left',on=['country','year'])
data_set = data_set.merge(birth_rate_df,how='left',on=['country','year'])
data_set = data_set.merge(death_rate_df,how='left',on=['country','year'])
data_set = data_set.merge(child_mortality_df,how='left',on=['country','year'])
data_set = data_set.merge(human_development_df,how='left',on=['country','year'])
data_set = data_set.merge(median_age_df,how='left',on=['country','year'])
data_set = data_set.merge(income_df,how='left',on=['country','year'])
data_set = data_set.merge(sanitation_access_df,how='left',on=['country','year'])
data_set = data_set.merge(water_access_df,how='left',on=['country','year'])
data_set = data_set.merge(internet_access_df,how='left',on=['country','year'])
data_set = data_set.merge(computer_access_df,how='left',on=['country','year'])
data_set.info()
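The chain of left joins follows one pattern, so it could also be folded over a list of dataframes with `functools.reduce`. This is only a sketch of that alternative; the helper name `merge_indicators` and the small demo frames are made up for illustration.

```python
from functools import reduce

import pandas as pd


def merge_indicators(frames, keys=("country", "year")):
    """Left-join a list of long dataframes on the key columns, in order.

    Hypothetical helper: the first frame decides which rows are kept.
    """
    return reduce(
        lambda left, right: left.merge(right, how="left", on=list(keys)), frames
    )


# Made-up demo data (not real Gapminder values)
population_demo = pd.DataFrame(
    {"country": ["A", "A"], "year": [2000, 2001], "population": [100, 110]}
)
urban_demo = pd.DataFrame(
    {"country": ["A"], "year": [2001], "urban_population_percent": [40.0]}
)
life_demo = pd.DataFrame(
    {"country": ["A", "A"], "year": [2000, 2001], "life_expectancy_years": [70.0, 70.5]}
)
merged_demo = merge_indicators([population_demo, urban_demo, life_demo])
print(merged_demo)
```

Because these are left joins, a country/year pair that is absent from a later frame shows up as NaN in that indicator's column, even if the individual frame was cleaned beforehand.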
# calculate global population for each year
global_population = total_population.sum()
We sum the population data across countries for each year, which gives the global population per year. We can then plot it using the 'plot' function.
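To illustrate why `sum()` yields one value per year: with countries as the index and years as columns, `DataFrame.sum()` adds up each column by default. The toy frame below uses made-up values; converting the year labels to integers is an optional extra step (an assumption, not something the project does) that gives the resulting series a numeric index.

```python
import pandas as pd

# Made-up demo data: countries as the index, one column per year
wide_population_demo = pd.DataFrame(
    {"1800": [10, 20], "1801": [11, 21]},
    index=pd.Index(["Aland", "Borduria"], name="geo"),
)

# Default axis=0 sums down each column, i.e. across countries for each year
global_demo = wide_population_demo.sum()

# Optional: use integer years as the index instead of string column labels
global_demo.index = global_demo.index.astype(int)
print(global_demo)
```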
# set plot size
plt.subplots(figsize=(8,6));
# plot global population
fig = global_population.plot(kind='line');
# set legend,title and labels
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in billions)',fontsize=15);
plt.title('Global Population Growth',fontsize=20);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
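The legend, label, title, and tick settings are repeated for every plot in this notebook, so they could be collected in a small helper. The sketch below is one way to do that; the name `style_axes` is hypothetical, the demo series is made up, and the `pad` argument of `set_title` is used here as an alternative to repositioning the title by hand.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for this sketch; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def style_axes(ax, title, xlabel, ylabel, legend_label):
    """Apply the legend, labels, title, and tick settings used in this notebook."""
    ax.legend([legend_label], fontsize="x-large", bbox_to_anchor=(1.05, 1),
              loc=2, borderaxespad=0.0)
    ax.set_xlabel(xlabel, fontsize=15)
    ax.set_ylabel(ylabel, fontsize=15)
    ax.set_title(title, fontsize=20, pad=20)  # pad adds space below the title
    ax.tick_params(axis="both", which="both", top=False, right=False)
    return ax


# Made-up demo series (not real Gapminder values)
series_demo = pd.Series([1.0, 2.5, 6.1], index=[1800, 1900, 2000])
fig_demo, ax_demo = plt.subplots(figsize=(8, 6))
series_demo.plot(kind="line", ax=ax_demo)
style_axes(ax_demo, "Global Population Growth", "Year",
           "Population (in billions)", "Population")
```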
Observation: the global population is on a rising trend, and the rate of increase appears to rise sharply from around 1950.
Let's now plot the population dataframe for all countries
# plot population data for each country since 1800
fig = population_df.plot(x='year',y='population',kind='line',figsize=(8,6));
# set legend,title and labels
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in billions)',fontsize=15);
plt.title('Population Growth by Countries',fontsize=20);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Observation: for a few countries, the population increases drastically from around 1950. For some countries, the population also seems to decrease.
We will now create separate dataframes, each holding the data for a particular year. This will simplify the comparison.
#*************************************************#
# get country population for specific years
#*************************************************#
# year: 1800
data_1800 = population_df[population_df.year==1800].loc[:,['country','population']].set_index('country')
# year: 1850
data_1850 = population_df[population_df.year==1850].loc[:,['country','population']].set_index('country')
# year: 1900
data_1900 = population_df[population_df.year==1900].loc[:,['country','population']].set_index('country')
# year: 1950
data_1950 = population_df[population_df.year==1950].loc[:,['country','population']].set_index('country')
# year: 2000
data_2000 = population_df[population_df.year==2000].loc[:,['country','population']].set_index('country')
# countries whose population in 2000 was lower than in 1800
data_2000[(data_2000.population - data_1800.population)<0]
# countries whose population in 2000 was lower than in 1900
data_2000[(data_2000.population - data_1900.population)<0]
# countries whose population in 1950 was lower than in 1800
data_1950[(data_1950.population - data_1800.population)<0]
# countries whose population in 1900 was lower than in 1800
data_1900[(data_1900.population - data_1800.population)<0]
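The same comparison can be made for any pair of years by pivoting the long dataframe so that each year becomes a column. The following is a sketch under assumptions: the function name `declined_between` is hypothetical, the demo values are made up, and the input is assumed to have exactly one row per country and year.

```python
import pandas as pd


def declined_between(long_df, start_year, end_year, value_col="population"):
    """Return the change for countries whose value fell between two years.

    Hypothetical helper; expects columns 'country', 'year', and value_col.
    """
    by_year = long_df.pivot(index="country", columns="year", values=value_col)
    change = by_year[end_year] - by_year[start_year]
    return change[change < 0].sort_values()


# Made-up demo data (not real Gapminder values)
decline_demo = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [1800, 2000, 1800, 2000],
        "population": [100, 80, 50, 500],
    }
)
declined_demo = declined_between(decline_demo, 1800, 2000)
print(declined_demo)
```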
From the above comparisons, we can see that two countries, 'Holy See' and 'Ireland', seem to have a decreasing population. Let us check them out.
# plot 'Ireland' population data
fig = population_df[population_df.country == 'Ireland'].plot(x='year',y='population',kind='line',figsize=(8,6));
# set legend,title and labels
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population',fontsize=15);
plt.title('Population Growth of Ireland (since 1800)',fontsize=20);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
'Ireland' seems to have decreased in population from around 1845 to 1960. After 1960, the population seems to be increasing.
# plot 'Holy See' population data
fig = population_df[population_df.country == 'Holy See'].plot(x='year',y='population',kind='line',figsize=(8,6));
# set legend,title and labels
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population',fontsize=15);
plt.title('Population Growth of Holy See (since 1800)',fontsize=20);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
'Holy See' shows a sudden drop in population after 1960, followed by a slow recovery.
# difference in population between 2000 & 1800
diff_population = data_2000 - data_1800
We calculate the difference in population for each country between the years 1800 and 2000. This gives us the absolute population growth of each country over two centuries. Sorting these values in descending order gives us the list of countries ranked by their population growth.
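The difference-and-sort step described above can be sketched on toy data. The frames and values below are made up. `DataFrame.nlargest` is shown as one alternative to sorting in place and slicing; it is not what the notebook itself uses.

```python
import pandas as pd

# toy populations indexed by country (made-up values)
pop_1800 = pd.DataFrame({'population': [10, 20, 30]}, index=['A', 'B', 'C'])
pop_2000 = pd.DataFrame({'population': [50, 25, 90]}, index=['A', 'B', 'C'])

# subtraction aligns rows on the index (country), not on position
growth = pop_2000 - pop_1800

# nlargest returns the top rows already sorted, without modifying growth
top_2 = growth.nlargest(2, 'population')
print(top_2)
```

Because the subtraction aligns on the country index, a country missing from either year would end up with a missing value rather than a wrong number.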
# sort the difference dataset by population in descending order
diff_population.sort_values('population',ascending=False,inplace=True)
# slice first 10 values in the dataset
diff_population[:10]
This is the list of Top 10 countries ranked by their population growth. Let us plot them too!
# plot top 10 countries by population growth
fig = diff_population[:10].plot(y='population',kind='bar',color='green',alpha=0.7,figsize=(10,6));
# set legend,title and labels
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Population Growth since 1800(in billions)',fontsize=15);
plt.title('Top 10 Countries in Population Growth',fontsize=20);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Observation: There seems to be a huge gap between rank 2(India) and rank 3(US)
We will now check, how these 10 countries have grown since 1800 by plotting their values
# get the total population data since 1800, for top 10 population growth countries
top_10_country = population_df[population_df.country.isin(diff_population.index[:10].tolist())]
top_10_country.info()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for top 10 population growth countries
fig = sns.lineplot(data=top_10_country,x='year',y='population', hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in billions)',fontsize=15);
plt.title('Top 10 Countries by Population',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Observation: China and India seem to have steep, accelerating population growth since 1950. This supports our initial observation that the rate of growth of the global population has increased since 1950.
Since we see a sudden change after 1950, let us check the countries' growth between 1950 and 2000.
# population difference between 1950 and 2000
diff_population = data_2000 - data_1950
diff_population.sort_values('population',ascending=False,inplace=True)
diff_population.head(10)
Observation: We can observe that United States and Indonesia have switched places. Also Russia and Japan are not in the top 10 list. Instead Mexico and Philippines have entered the top 10.
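Comparing two top-10 lists by eye is error-prone. A minimal sketch of how the comparison could be automated is shown below. The two lists are hypothetical placeholders; with the real data they would come from `diff_population.index[:10].tolist()` for each period.

```python
# hypothetical rankings, for illustration only
top_since_1800 = ['A', 'B', 'C', 'D']
top_since_1950 = ['A', 'C', 'B', 'E']

# countries that appear in only one of the two lists
entered = sorted(set(top_since_1950) - set(top_since_1800))
dropped = sorted(set(top_since_1800) - set(top_since_1950))

# countries in both lists whose rank changed
moved = [c for c in top_since_1950
         if c in top_since_1800
         and top_since_1800.index(c) != top_since_1950.index(c)]
print(entered, dropped, moved)
```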
# plot top 10 population growth countries since 1950
fig = diff_population[:10].plot(y='population',kind='bar',color='orange',alpha=0.7,figsize=(8,6));
# set legend,label and title
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Population Growth since 1950(in hundred millions)',fontsize=15);
plt.title('Top 10 Countries in Population Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Let's also plot the population data for these countries in the latest list
# get Total population data for the top 10 countries
top_10_country = population_df[population_df.country.isin(diff_population.index[:10].tolist())]
# set plot size
plt.subplots(figsize=(8,6));
# plot top 10 countries
fig = sns.lineplot(data=top_10_country,x='year',y='population', hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in billions)',fontsize=15);
plt.title('Top 10 Countries in Population Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
diff_population.tail()
We see that 'Holy See' and 'St. Kitts and Nevis' have negative values, which show a population decrease between 1950 and 2000. Since we've already seen 'Holy See', let's check 'St. Kitts and Nevis'.
# plot 'St. Kitts and Nevis' population data
fig = population_df[population_df.country == 'St. Kitts and Nevis'].plot(x='year',y='population',kind='line',figsize=(8,6));
# set legend,title and labels
plt.legend(['Population'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population',fontsize=15);
plt.title('Population Growth of St. Kitts and Nevis (since 1800)',fontsize=20);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
'St. Kitts and Nevis' shows a population decrease from around 1960 until 2000. From 2000 onwards, the population is on a rising trend.
Observation: The global population is on a rising trend. Individually, the trends of most countries point the same way.
We have seen that population is increasing. Our next question is about the spatial distribution of population: are people moving towards cities as the population grows?
# get minimum and maximum years from Urban Dataframe
print(urban_df.year.min())
print(urban_df.year.max())
#*************************************************#
# get urban population percent for specific years
#*************************************************#
# year: 1960
data_1960 = urban_df[urban_df.year==1960].loc[:,['country','urban_population_percent']].set_index('country')
# year: 2016
data_2016 = urban_df[urban_df.year==2016].loc[:,['country','urban_population_percent']].set_index('country')
# urban population percent difference between 1960 and 2016
diff_urban_population = data_2016 - data_1960
data_2016.sort_values('urban_population_percent',ascending=False,inplace=True)
data_2016[:10]
This gives us the top 10 countries ranked by the share of their population living in urban areas. The data is for the year 2016. It looks like in Monaco, Nauru and Singapore the entire population lives in urban areas!
diff_urban_population.sort_values('urban_population_percent',ascending=False,inplace=True)
diff_urban_population[:10]
This is the list of top 10 countries ranked by growth in urban population share. Between 1960 and 2016, Gabon's urban population share has grown from 17.4% to 87.4%. It has the largest increase, a difference of 70 percentage points. We will plot this data now.
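Since the indicator is already a percentage, the difference between two years is a change in percentage points, not a relative growth rate. The short sketch below works through both figures for the Gabon values quoted above. The end year follows the `data_2016 - data_1960` computation in the code, and the variable names are only illustrative.

```python
# values quoted above for Gabon (urban population as % of total)
urban_1960 = 17.4
urban_2016 = 87.4

# absolute change, in percentage points
change_points = urban_2016 - urban_1960

# relative change, as a percentage of the 1960 level
change_relative = change_points / urban_1960 * 100

print(round(change_points, 1), round(change_relative, 1))
```

The ranking in this notebook uses the first figure, the change in percentage points.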
# plot top 10 urban population growth countries since 1960
fig = diff_urban_population[:10].plot(y='urban_population_percent',kind='bar',color='blue',
alpha=0.7,ylim=(0,100),figsize=(8,6));
# set legend,label and title
plt.legend(['Urban Population %'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Urban Population Growth % since 1960',fontsize=15);
plt.title('Top 10 Countries in Urban Population Growth %',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
top_urban_country = diff_urban_population.index[:10].tolist()
We will plot the data for the top 10 countries by urban population growth % below
# set plot size
plt.subplots(figsize=(8,6));
# plot the urban population trend for top 10 urban population growth countries
fig = sns.lineplot(data=urban_df[urban_df.country.isin(top_urban_country)],x='year',y='urban_population_percent',
hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Urban Population (%)',fontsize=15);
plt.title('Top 10 Countries by Urban Population Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Observation: The urban population growth % is on a rising trend for all the countries above
We will check the change/growth in total population using the population dataframe for the above countries
population_df[population_df.country=='South Korea'].max()
population_df[population_df.country=='Saudi Arabia'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the total population trend for top 10 urban population growth countries
fig = sns.lineplot(data=population_df[population_df.country.isin(top_urban_country)],x='year',y='population',
hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Population Trend of Top 10 Countries in Urban Population Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# set y tick-labels
fig.set_yticklabels([0,10,20,30,40,50,60]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
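A note on the `set_yticklabels` call above: it assigns labels by position, so the labels only match the data if the automatic tick locations happen to fall on the assumed values. A more robust option is to format each tick from its actual value. The sketch below uses matplotlib's `FuncFormatter` on made-up data; the names `fig_toy` and `millions` are hypothetical.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig_toy, ax = plt.subplots(figsize=(8, 6))
# made-up population values, for illustration only
ax.plot([1960, 1990, 2016], [25_000_000, 43_000_000, 51_000_000])

# format each tick from its actual value, so labels cannot drift from the data
millions = FuncFormatter(lambda value, position: f'{value / 1e6:.0f}')
ax.yaxis.set_major_formatter(millions)
ax.set_ylabel('Population (in millions)', fontsize=15)
```

The same formatter object could be reused for every plot labelled "in millions" in this notebook.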
Observation: Although 'South Korea' and 'Saudi Arabia' show a rapid increase in total population, their urban population share does not follow the same growth pattern.
From the above difference dataset we can also see some countries with negative values. We will take a look into them now.
decreasing_urban_list = diff_urban_population[diff_urban_population.urban_population_percent<0].index.tolist()
decreasing_urban_list
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for countries with decreasing urban population
fig = sns.lineplot(data=urban_df[urban_df.country.isin(decreasing_urban_list)],x='year',y='urban_population_percent',
hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Urban Population (%)',fontsize=15);
plt.title('Urban Population Trend of Countries with Decreasing Urban Population',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
These countries have a decreasing urban population percent. Maybe people are leaving the country? We need to check the population dataframe.
population_df[population_df.country == 'Tajikistan'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for countries with decreasing urban population
fig = sns.lineplot(data=population_df[population_df.country.isin(decreasing_urban_list)],x='year',y='population',
hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Total Population Trend of Countries with Decreasing Urban Population',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Observation: Here we see that the countries with a decreasing urban population percent have an increasing total population.
So our assumption that people are leaving these countries seems to be wrong. That is surprising!
Let's check out how urban growth looks in the top 10 countries by total population growth.
# set plot size
plt.subplots(figsize=(8,6));
# plot the urban population trend for top 10 population growth countries
fig = sns.lineplot(data=urban_df[urban_df.country.isin(diff_population.index[:10].tolist())],
x='year',y='urban_population_percent',
hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Urban Population (%)',fontsize=15);
plt.title('Urban Population Trend of Top 10 Countries in Total Population Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Except for 'Philippines', the countries seem to have an increasing urban population percent.
Observation: An increase in the total population of a country does not require an increase in its urban population share. The relation between the two appears weak.
fig = sns.jointplot(data=data_set, x = 'population', y = 'urban_population_percent',kind='reg');
This seems to support our observation! There appears to be little or no linear correlation between Total Population and Urban Population percent.
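The visual impression from the joint plot can be backed by a number. As a minimal sketch, the Pearson correlation coefficient can be computed with `Series.corr`. The toy frame below is made up; with the real data the same call would be applied to the `population` and `urban_population_percent` columns of `data_set`.

```python
import pandas as pd

# toy data (made-up values): population and urban share for five countries
toy = pd.DataFrame({
    'population': [1, 2, 3, 4, 5],
    'urban_population_percent': [50, 20, 80, 30, 60],
})

# Pearson correlation coefficient, always between -1 and 1
r = toy['population'].corr(toy['urban_population_percent'])
print(round(r, 3))
```

A value close to 0 suggests a weak linear relationship. Pearson correlation only measures linear association, so it cannot rule out other kinds of dependence.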
Whew! Now to the next step. Is population growth related to changes in birth and death rates? Are they correlated in some way? Come on, let's check it out.
# get minimum and maximum years from Birth Rate Dataframe
print(birth_rate_df.year.min())
print(birth_rate_df.year.max())
#*************************************************#
# get birth rate for specific years
#*************************************************#
# year: 1800
data_1800 = birth_rate_df[birth_rate_df.year==1800].loc[:,['country','crude_birth_rate']].set_index('country')
# year: 2015
data_2015 = birth_rate_df[birth_rate_df.year==2015].loc[:,['country','crude_birth_rate']].set_index('country')
data_2015.sort_values('crude_birth_rate',ascending=False,inplace=True)
data_2015[:10]
These are the top 10 countries by crude birth rate. The data is for the year 2015.
We will now find the difference of crude birth rates between 2015 and 1800 for each country
# birth rate difference between 1800 and 2015
diff_birth_rate = data_2015 - data_1800
diff_birth_rate.sort_values('crude_birth_rate',ascending=False,inplace=True)
diff_birth_rate[:10]
Surprising! Other than 'Chad' and 'Niger', all other countries have a negative change in birth rate?! This means that the crude birth rate has been decreasing since 1800 almost everywhere!
diff_birth_rate[diff_birth_rate.crude_birth_rate>0]
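To put a number on "almost all countries", the sign of the difference can be counted directly. This is a sketch on a made-up frame shaped like `diff_birth_rate`; the names `toy_diff`, `n_decreased` and `share_decreased` are hypothetical.

```python
import pandas as pd

# toy change in crude birth rate, 2015 minus 1800 (made-up values)
toy_diff = pd.DataFrame({'crude_birth_rate': [1.5, -20.0, -35.0, -10.0]},
                        index=['A', 'B', 'C', 'D'])

# countries whose birth rate went up
increased = toy_diff[toy_diff.crude_birth_rate > 0].index.tolist()

# number and share of countries whose birth rate went down
n_decreased = int((toy_diff.crude_birth_rate < 0).sum())
share_decreased = n_decreased / len(toy_diff)
print(increased, n_decreased, share_decreased)
```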
# plot top 10 countries by change in crude birth rate since 1800
fig = diff_birth_rate[:10].plot(kind='bar',alpha=0.7,figsize=(8,6));
# set legend,label and title
plt.legend(['Birth Rate'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Crude Birth Rate Growth since 1800',fontsize=15);
plt.title('Top 10 Countries in Birth Rate Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
birth_rate_country = diff_birth_rate[:10].index.tolist()
birth_rate_country
These are the top 10 countries by crude_birth_rate growth. We will plot their birth rate trend first, and then their population growth from the population dataframe.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for top 10 birth rate countries
fig = sns.lineplot(data=birth_rate_df[(birth_rate_df.country.isin(birth_rate_country))& (birth_rate_df.year>1945)],
x='year',y='crude_birth_rate', hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Top 10 Birth Rate Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
population_df[population_df.country=='Congo, Dem. Rep.'].max()
population_df[population_df.country=='China'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for top 10 birth rate countries
fig = sns.lineplot(data=population_df[population_df.country.isin(birth_rate_country)],
x='year',y='population', hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Top 10 Birth Rate Countries - Total Population Trend',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# set y tick-labels
fig.set_yticklabels([0,10,20,30,40,50,60,70,80,90]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Oh..! Although these countries have a rising trend in population, their birth_rate seems to be dropping. This means fewer children are born per 1000 people, even though the total number of births can still rise as the population grows. Hmmm...
Observation: crude_birth_rate is on a downward trend for most of the countries!
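A falling crude birth rate does not by itself mean fewer births in absolute terms, because the rate is expressed per 1000 people. The arithmetic can be sketched as follows; the function name `births_per_year` and all numbers are made up for illustration.

```python
def births_per_year(crude_birth_rate, population):
    """Crude birth rate is births per 1000 people, so scale by population / 1000."""
    return crude_birth_rate * population / 1000

# made-up values: the rate falls while the population grows
births_then = births_per_year(crude_birth_rate=45, population=10_000_000)
births_now = births_per_year(crude_birth_rate=30, population=40_000_000)
print(births_then, births_now)
```

In this toy case the rate drops by a third, yet the yearly number of births more than doubles because the population is four times larger.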
We need to check how our top 10 population countries do in birth_rate.
# set plot size
plt.subplots(figsize=(8,6));
# plot the birth rate trend for top 10 population growth countries
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(diff_population.index[:10].tolist())],
x='year',y='crude_birth_rate', hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Top 10 Total Population Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Even our top 10 countries by population growth have a downward trend! We'll plot the difference in birth rate for these countries using our difference dataset.
# plot birth rate change since 1800 for top 10 population growth countries
fig = diff_birth_rate.loc[diff_population.index[:10].tolist()].plot(kind='bar',figsize=(8,6));
# set legend,label and title
plt.legend(['Birth Rate'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Crude Birth Rate Growth since 1800',fontsize=15);
plt.title('Birth Rate Growth of Top 10 Total Population Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
United States has the largest change. Its birth_rate has reduced by about 35 births per 1000 people!
Observation: Most of the countries of the world have a downward trend in birth_rate.
Then what is causing the population rise? If the birth rate is indeed reducing, how can the population increase? Is it because fewer people die nowadays? Maybe! We have the data. That should help us.
# get minimum and maximum years from Death Rate Dataframe
print(death_rate_df.year.min())
print(death_rate_df.year.max())
#*************************************************#
# get death rate for specific years
#*************************************************#
# year: 1950
data_1950 = death_rate_df[death_rate_df.year==1950].loc[:,['country','crude_death_rate']].set_index('country')
# year: 2018
data_2018 = death_rate_df[death_rate_df.year==2018].loc[:,['country','crude_death_rate']].set_index('country')
data_2018.sort_values('crude_death_rate',ascending=False,inplace=True)
data_2018[-10:]
The countries above have the lowest death rates, meaning the number of people dying per 1000 is very low! They're the top 10 in our list. The data is for the year 2018.
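Selecting the lowest values by sorting in descending order and slicing the tail works, but `sort_values` places missing values last by default, so a tail slice can pick up rows without data. One alternative, sketched here on a made-up frame, is `DataFrame.nsmallest`. It is still worth checking the real data for missing values first.

```python
import pandas as pd

# toy crude death rates for one year (made-up values)
toy_rates = pd.DataFrame({'crude_death_rate': [12.0, 3.5, 7.1, 1.9]},
                         index=['A', 'B', 'C', 'D'])

# the two lowest rates, ordered from lowest to highest
lowest = toy_rates.nsmallest(2, 'crude_death_rate')
print(lowest)
```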
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for top 10 death rate countries
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(data_2018[-10:].index.tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Death Rate Trend of Top 10 Death Rate Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Wow! They show a drastic decrease in death rate. 'Oman' went from 30 to 5 in 40 years! What does their total population look like?
population_df[population_df.country=='Saudi Arabia'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for top 10 death rate countries
fig = sns.lineplot(data=population_df[population_df.country.isin(data_2018[-10:].index.tolist())],
x='year',y='population',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Top 10 Death Rate Countries - Total Population Trend',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
fig.set_yticklabels([0,5,10,15,20,25,30,35]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
There you go! Now we have a likely explanation for why population increases. It is partly due to new births, but mostly due to the decrease in death rate. People are trying to cheat death and attain immortality! This is the outcome, probably thanks to improved medical facilities.
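The interplay of births and deaths can be made explicit as the rate of natural increase, which is the crude birth rate minus the crude death rate. The sketch below assumes two long-format frames shaped like `birth_rate_df` and `death_rate_df`; the toy values and the column name `natural_increase` are made up, and migration is ignored.

```python
import pandas as pd

# toy long-format frames shaped like birth_rate_df and death_rate_df (made-up values)
births = pd.DataFrame({'country': ['A', 'A'], 'year': [1950, 2015],
                       'crude_birth_rate': [45.0, 25.0]})
deaths = pd.DataFrame({'country': ['A', 'A'], 'year': [1950, 2015],
                       'crude_death_rate': [30.0, 5.0]})

# line up both rates for the same country and year
rates = births.merge(deaths, on=['country', 'year'])

# rate of natural increase, per 1000 people per year (migration ignored)
rates['natural_increase'] = rates.crude_birth_rate - rates.crude_death_rate
print(rates)
```

In this toy case the birth rate falls by 20 but the death rate falls by 25, so natural increase goes up even though fewer children are born per 1000 people.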
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend for top 10 death rate countries
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(data_2018[-10:].index.tolist())
& (birth_rate_df.year>1930)],
x='year',y='crude_birth_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Top 10 Death Rate Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
This is the birth rate trend for the countries with the lowest death rates. It supports our point!
Now, let's check how the death rate has changed in each country between 1950 and 2018.
# death rate difference between 1950 and 2018
diff_death_rate = data_2018 - data_1950
diff_death_rate.sort_values('crude_death_rate',ascending=False,inplace=True)
diff_death_rate[:10]
Wait!
These countries have increasing death rate between 1950 and 2018?!
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(diff_death_rate[:10].index.tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Countries with Increasing Death Rate',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Surprisingly, YES! These countries have an increasing death rate. We need to see how their birth rate and population growth look.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(diff_death_rate[:10].index.tolist())
& (birth_rate_df.year>1930)],
x='year',y='crude_birth_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Countries with Increasing Death Rate',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
population_df[population_df.country=='Russia'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=population_df[population_df.country.isin(diff_death_rate[:10].index.tolist())],
x='year',y='population',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Total Population Trend of Countries with Increasing Death Rate',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
fig.set_yticklabels([0,20,40,60,80,100,120,140,160]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Their birth rate is on a downward trend. Also, except for 'Russia', 'Japan' and 'Ukraine', all other countries have a fairly linear graph. That looks consistent: since more people die, the population rise is not that spectacular.
However, 'Russia' and 'Japan' have a drastic increase in population, even though their death rate is increasing and their birth rate is decreasing. This is strange. More people die, fewer people are born, but the population increases?
Maybe it is because people from other countries are moving in. We cannot be sure, but maybe!
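The migration guess can be framed with the demographic balancing equation: population change equals births minus deaths plus net migration. Our datasets hold rates rather than counts and contain no migration figures, so the sketch below only illustrates the arithmetic with made-up counts. The function name `implied_net_migration` is hypothetical.

```python
def implied_net_migration(pop_start, pop_end, births, deaths):
    """Demographic balancing equation solved for net migration."""
    natural_increase = births - deaths
    return (pop_end - pop_start) - natural_increase

# made-up counts for one period: more deaths than births, yet the population grows
net = implied_net_migration(pop_start=1_000_000, pop_end=1_030_000,
                            births=20_000, deaths=25_000)
print(net)
```

A positive result means that growth not explained by births and deaths would have to come from people moving in.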
diff_death_rate[-10:]
These are our top 10 countries with the most drastic reduction in death rate! Let's plot them. [whisper: OMAN is in the list]
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(diff_death_rate[-10:].index.tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Top 10 Countries in Death Rate Reduction',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
That is good. It is surprising that developed countries like the US, the UK or other European countries are not in this list. Most of the countries found here are Middle-Eastern or Eastern countries. It seems they may have received a lot of help from Western countries, though this data cannot confirm that. Let's also check their birth rate and total population growth.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(diff_death_rate[-10:].index.tolist())
& (birth_rate_df.year>1940)],
x='year',y='crude_birth_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Top 10 Countries in Death Rate Reduction',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
We can see that the reduction in death rate is greater than the reduction in birth rate in these countries, which results in increased population as seen below
population_df[population_df.country=='Iraq'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=population_df[population_df.country.isin(diff_death_rate[-10:].index.tolist())],
x='year',y='population',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Total Population Trend of Top 10 Countries in Death Rate Reduction',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
fig.set_yticklabels([0,5,10,15,20,25,30,35,40]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
'Iraq' has a smaller reduction in birth rate, and therefore a proportionally larger increase in population. 'Timor-Leste' has a large reduction in birth rate, so its population stays roughly the same even with the decreased death rate.
Let's not forget our top 10 population countries! We'll see their death rate now.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(diff_population.index[:10].tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Death Rate Trend of Top 10 Countries by Total Population',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# plot death rate change since 1950 for top 10 population growth countries
fig = diff_death_rate.loc[diff_population.index[:10].tolist()].plot(kind='bar',figsize=(8,6));
# set legend,label and title
plt.legend(['Death Rate'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Crude Death Rate Growth since 1950',fontsize=15);
plt.title('Death Rate Growth of Top 10 Total Population Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
As expected, their death rate is decreasing drastically, especially for 'Pakistan', 'Bangladesh' and 'India'.
Observation: The increase in population in various countries, in spite of a falling birth rate, is due to a rapid decrease in death rate.
The 'Human Development Index' helps us rank countries based on 'Human Development'. The Index is based on Health, Education & Living Standard.
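To make the idea of a composite index concrete, here is a rough sketch of how three dimension indices could be combined. It assumes the geometric-mean aggregation that, to my understanding, UNDP has used since 2010. The Gapminder file already provides the final index, so this function is purely illustrative, and the name `human_development_index` and the input values are made up.

```python
def human_development_index(health, education, income):
    """Geometric mean of three dimension indices, each between 0 and 1."""
    return (health * education * income) ** (1 / 3)

# made-up dimension indices for one country
hdi = human_development_index(health=0.9, education=0.8, income=0.7)
print(round(hdi, 3))
```

With a geometric mean, a low score in one dimension pulls the overall index down more than it would in a simple average.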
# get minimum and maximum years from Human Development Dataframe
print(human_development_df.year.min())
print(human_development_df.year.max())
#*************************************************#
# get human development for specific years
#*************************************************#
# year: 1990
data_1990 = human_development_df[human_development_df.year==1990].loc[:,['country','human_development_index']
].set_index('country')
# year: 2015
data_2015 = human_development_df[human_development_df.year==2015].loc[:,['country','human_development_index']
].set_index('country')
data_2015.sort_values('human_development_index',ascending=False,inplace=True)
data_2015[:10]
These are the top countries by HD index. The data is for the year 2015. Let's plot the trend for these countries.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=human_development_df[human_development_df.country.isin(data_2015[:10].index.tolist())],
x='year',y='human_development_index',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Human Development Index',fontsize=15);
plt.title('Top 10 Countries in Human Development Index',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
population_df[population_df.country=='Germany'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=population_df[population_df.country.isin(data_2015[:10].index.tolist())],
x='year',y='population',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Total Population Trend of Top 10 Countries in Human Development Index',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
fig.set_yticklabels([0,10,20,30,40,50,60,70,80,90]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(data_2015[:10].index.tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Death Rate Trend of Top 10 Countries in Human Development Index',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(data_2015[:10].index.tolist())
& (birth_rate_df.year>1940)],
x='year',y='crude_birth_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Top 10 Countries in Human Development Index',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Let's measure the growth of each country by taking the difference in HD index between 1990 and 2015.
# HD index difference between 1990 and 2015
diff_hd_index = data_2015 - data_1990
diff_hd_index.sort_values('human_development_index',ascending=False,inplace=True)
diff_hd_index[:10]
These are the top 10 countries by HD index improvement over 25 years. Let's plot their trend.
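One detail worth keeping in mind when subtracting two country-indexed frames: pandas aligns rows on the index, so a country missing in either year produces NaN rather than an error. The sketch below uses small hypothetical frames (countries 'A', 'B', 'C' and their values are made up) to show this behaviour and one way to handle it.

```python
import pandas as pd

# hypothetical values, for illustration only
data_start = pd.DataFrame(
    {'human_development_index': [0.40, 0.60]},
    index=pd.Index(['A', 'B'], name='country'))
data_end = pd.DataFrame(
    {'human_development_index': [0.55, 0.65, 0.70]},
    index=pd.Index(['A', 'B', 'C'], name='country'))

# subtraction aligns rows on the country index; 'C' has no start value, so it becomes NaN
diff = data_end - data_start

# drop countries missing in either year, then rank by growth
diff_clean = diff.dropna().sort_values('human_development_index', ascending=False)
print(diff_clean)
```

By default `sort_values` places NaN rows last, so a slice such as `[-10:]` on an uncleaned difference may pick up countries with missing data rather than the slowest growers.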
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=human_development_df[human_development_df.country.isin(diff_hd_index[:10].index.tolist())],
x='year',y='human_development_index',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Human Development Index',fontsize=15);
plt.title('Top 10 Countries in Human Development Index Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
HD index is on a rising trend for these countries.
population_df[population_df.country=='India'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=population_df[population_df.country.isin(diff_hd_index[:10].index.tolist())],
x='year',y='population',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in billions)',fontsize=15);
plt.title('Total Population Trend of Top 10 Countries in Human Development Index Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(diff_hd_index[:10].index.tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Death Rate Trend of Top 10 Countries in Human Development Index Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
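Observation: The death rate peaks sharply for Cambodia between 1970 and 1985 and for Rwanda between 1985 and 2000, each over a span of about 15 years. These windows coincide with the Khmer Rouge period in Cambodia (1975 to 1979) and the 1994 genocide in Rwanda, which likely explains the spikes.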
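To pin down such peaks numerically instead of reading them off the chart, one option is to look up the year of the maximum death rate per country. The sketch below assumes a long-format frame with the same column names as `death_rate_df`; the countries 'X' and 'Y' and their values are hypothetical.

```python
import pandas as pd

# hypothetical long-format frame with the same columns as death_rate_df
sample = pd.DataFrame({
    'country': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'year': [1970, 1975, 1980, 1990, 1995, 2000],
    'crude_death_rate': [20.0, 45.0, 18.0, 15.0, 40.0, 14.0],
})

# row label of the maximum death rate within each country
peak_rows = sample.groupby('country')['crude_death_rate'].idxmax()

# look up the year and value of each peak
peaks = sample.loc[peak_rows, ['country', 'year', 'crude_death_rate']].set_index('country')
print(peaks)
```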
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(diff_hd_index[:10].index.tolist())
& (birth_rate_df.year>1940)],
x='year',y='crude_birth_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Top 10 Countries in Human Development Index Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
diff_hd_index[-10:]
These are the bottom 10 countries by HD index growth over 25 years. Surprisingly, the HD index has actually fallen for 3 of them. We'll plot them now.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=human_development_df[human_development_df.country.isin(diff_hd_index[-10:].index.tolist())],
x='year',y='human_development_index',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Human Development Index',fontsize=15);
plt.title('Countries with Least Human Development Index Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
We can see that for these countries the index stays fairly flat, without much change.
population_df[population_df.country=='Syria'].max()
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=population_df[population_df.country.isin(diff_hd_index[-10:].index.tolist())],
x='year',y='population',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Population (in millions)',fontsize=15);
plt.title('Total Population Trend of Least Human Development Index Growth Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
fig.set_yticklabels([0,5,10,15,20,25]);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=death_rate_df[death_rate_df.country.isin(diff_hd_index[-10:].index.tolist())
& (death_rate_df.year<2019)],
x='year',y='crude_death_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Deaths Per 1000 Population',fontsize=15);
plt.title('Death Rate Trend of Least Human Development Index Growth Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=birth_rate_df[birth_rate_df.country.isin(diff_hd_index[-10:].index.tolist())
& (birth_rate_df.year>1940)],
x='year',y='crude_birth_rate',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Average No.of Births Per 1000 Population',fontsize=15);
plt.title('Birth Rate Trend of Least Human Development Index Growth Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Let's check index growth for top 10 countries by population
# plot top 10 population growth countries since 1950
fig = diff_hd_index.loc[diff_population.index[:10].tolist()].plot(kind='bar',figsize=(8,6));
# set legend,label and title
plt.legend(['Human Development Index'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Human Development Index Growth since 1990',fontsize=15);
plt.title('Human Development Index Growth of Top 10 Total Population Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Since the HD index is based on several factors, let's also compare it with the economy.
We have the data for income per person, which is in international dollars fixed 2011 prices and is adjusted for purchasing power.
# get minimum and maximum years from Income Dataframe
print(income_df.year.min())
print(income_df.year.max())
#*************************************************#
# get income per person for specific years
#*************************************************#
# year: 1800
data_1800 = income_df[income_df.year==1800].loc[:,['country','income_per_person']].set_index('country')
# year: 2018
data_2018 = income_df[income_df.year==2018].loc[:,['country','income_per_person']].set_index('country')
data_2018.sort_values('income_per_person',ascending=False,inplace=True)
data_2018[:10]
These are the top 10 countries by income per person. The data is for the year 2018.
data_2015[:10]
We can already see that, of the top 10 countries by HD index for 2015, 4 also appear in the top 10 by income per person.
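The overlap between two rankings can also be counted programmatically instead of by eye. The sketch below is a minimal example that assumes both frames are indexed by country, as `data_2015` and `data_2018` are; the frames `top_hdi` and `top_income` and their contents are hypothetical.

```python
import pandas as pd

# hypothetical top-N style frames indexed by country
top_hdi = pd.DataFrame({'human_development_index': [0.95, 0.94, 0.93]},
                       index=pd.Index(['A', 'B', 'C'], name='country'))
top_income = pd.DataFrame({'income_per_person': [90000, 80000, 70000]},
                          index=pd.Index(['B', 'D', 'A'], name='country'))

# countries present in both rankings
common = top_hdi.index.intersection(top_income.index)
print(len(common), sorted(common))
```

Applied to the notebook's data, the same idea would be `data_2015[:10].index.intersection(data_2018[:10].index)`.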
We will now plot top countries by income for the year 2018.
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=income_df[income_df.country.isin(data_2018[:10].index.tolist())],
x='year',y='income_per_person',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Income(2011-Intl. Dollars)',fontsize=15);
plt.title('Income Trend of Top 10 Countries by Income Per Person',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=human_development_df[human_development_df.country.isin(data_2018[:10].index.tolist())],
x='year',y='human_development_index',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Human Development Index',fontsize=15);
plt.title('Human Development Index of Top 10 Income Countries',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
# income difference between 1800 and 2018
diff_income = data_2018 - data_1800
diff_income.sort_values('income_per_person',ascending=False,inplace=True)
diff_income[:10]
These are the top 10 countries by income growth since 1800
# set plot size
plt.subplots(figsize=(8,6));
# plot the trend
fig = sns.lineplot(data=income_df[income_df.country.isin(diff_income[:10].index.tolist())],
x='year',y='income_per_person',hue='country',palette='bright');
#set legend,label and title
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,fontsize='medium');
plt.xlabel('Year',fontsize=15);
plt.ylabel('Income(2011-Intl. Dollars)',fontsize=15);
plt.title('Income Trend of Top 10 Countries by Income Growth',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
We will also check how the top 10 countries by population have performed.
# plot top 10 population growth countries since 1950
fig = diff_income.loc[diff_population.index[:10].tolist()].plot(kind='bar',figsize=(8,6));
# set legend,label and title
plt.legend(['Income'],fontsize='x-large',bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xlabel('Country',fontsize=15);
plt.ylabel('Income Growth in Dollars since 1800',fontsize=15);
plt.title('Income Growth of Top 10 Countries by Total Population',fontsize=18);
# turn-off ticks in right and top axes
plt.tick_params(axis='both',which='both',top=False,right=False);
# pad spaces below title
ttl = fig.title;
ttl.set_position([.5, 1.05]);
Now that we have analysed our dataset for these indicators, let's also check the relationships between them.
human_development_df.year.min()
sns.jointplot(data=data_set[data_set.year>1989], y = 'human_development_index',
x = 'income_per_person',kind='reg');
Observation: As expected, there is positive correlation between Income and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], y = 'human_development_index',
x = 'life_expectancy_years',kind='reg');
Observation: There is high positive correlation between Life Expectancy and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], y = 'human_development_index',
x = 'crude_death_rate',kind='reg');
Observation: There is negative correlation between Death Rate and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], y = 'human_development_index',
x = 'child_mortality_rate',kind='reg');
Observation: There is high negative correlation between Child Mortality Rate and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], x = 'population', y = 'human_development_index',kind='reg');
Observation: There is no correlation between Total Population and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], x = 'urban_population_percent',
y = 'human_development_index',kind='reg');
Observation: There is high positive correlation between Urban Population and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], x = 'sanitation_access_percent',
y = 'human_development_index',kind='reg');
Observation: There is high positive correlation between Sanitation Access and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], x = 'water_access_percent',
y = 'human_development_index',kind='reg');
Observation: There is high positive correlation between Water Access and Human Development Index
sns.jointplot(data=data_set[data_set.year>1989], x = 'internet_access_percent',
y = 'human_development_index',kind='reg');
Observation: There is positive correlation between Internet Access and Human Development Index
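The joint plots give a visual impression of each relationship. To put numbers on words like "high" or "no" correlation, the coefficients can be computed directly. The sketch below assumes a frame with some of the same column names as `data_set`; the values are hypothetical and only illustrate the calls.

```python
import pandas as pd

# hypothetical sample with three of the columns used above
sample = pd.DataFrame({
    'human_development_index': [0.40, 0.55, 0.70, 0.85, 0.90],
    'income_per_person': [1000, 4000, 12000, 30000, 45000],
    'child_mortality_rate': [150.0, 90.0, 40.0, 10.0, 5.0],
})

target = 'human_development_index'

# Pearson correlation of every other column with the HD index
pearson = sample.corr()[target].drop(target)

# Spearman correlation, computed here as the Pearson correlation of the ranks;
# it captures monotonic relationships even when they are not linear
spearman = sample.rank().corr()[target].drop(target)

print(pearson)
print(spearman)
```

Income is a case where the rank-based coefficient can be informative, since its relationship with the index may be monotonic without being a straight line.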
Observation 1 : The global population is on a rising trend. The trends of individual countries point the same way.
Observation 2 : An increase in a country's total population does not require an increase in its urban population. The relationship between the two is weak.
Observation 3 : The increase in population in various countries, in spite of a decreasing number of births, is due to a rapid decrease in the death rate.
Observation 4 : There is a high positive correlation between Human Development Index and various indicators such as Water Access, Sanitation Access, Internet Access, Urban Population %, Life Expectancy and Income.
Observation 5 : There is a high negative correlation between Human Development Index and both Child Mortality and Death Rate.
Observation 6 : There is no correlation between Human Development Index and Total Population Growth.
1. NUMBER : The data used for analysis is not exhaustive, so we cannot draw firm conclusions about results or relationships, although observations can be made.
2. QUALITY : The data was cleaned for analysis by filling missing values or dropping them altogether. The outcome of this analysis is limited by the quality of the data.
3. SOURCE : The data presented has been collected from various sources. The credibility of these sources, and therefore of the data used for analysis, is limited because fully credible data is unavailable.
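The quality limitation above depends on how gaps were treated. As a minimal sketch, not necessarily the approach taken in this notebook, the example below counts missing values per country and contrasts dropping rows with interpolating inside each country's series. The frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical long-format data with gaps
sample = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [2000, 2001, 2002, 2000, 2001, 2002],
    'water_access_percent': [80.0, np.nan, 84.0, np.nan, np.nan, 60.0],
})

# count missing values per country before deciding how to treat them
missing_per_country = (sample['water_access_percent'].isna()
                       .groupby(sample['country']).sum())

# option 1: drop incomplete rows
dropped = sample.dropna(subset=['water_access_percent'])

# option 2: interpolate only gaps that lie between two known values of the same country
filled = sample.copy()
filled['water_access_percent'] = (
    filled.groupby('country')['water_access_percent']
          .transform(lambda s: s.interpolate(limit_area='inside')))

print(missing_per_country)
print(filled)
```

Reporting the per-country count of missing values alongside the results makes it easier to judge how much any one observation relies on filled data.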
from subprocess import call
call(['python', '-m', 'nbconvert', 'Investigate_a_Dataset.ipynb'])